Clustering Methods For Spatial Datamining

Author

  • Dimitrios Gunopulos
Abstract

We investigate the use of biased sampling according to the density of the dataset, to speed up the operation of general data mining tasks, such as clustering and outlier detection in large multidimensional datasets. In density-biased sampling, the probability that a given point will be included in the sample depends on the local density of the dataset. We propose a general technique for density-biased sampling that can factor in user requirements to sample for properties of interest, and can be tuned for specific data mining tasks. This allows great flexibility, and improved accuracy of the results over simple random sampling. We describe our approach in detail, we analytically evaluate it, and show how it can be optimized for approximate clustering and outlier detection. Finally we present a thorough experimental evaluation of the proposed method, applying density-biased sampling on real and synthetic data sets, and employing clustering and outlier detection algorithms, thus highlighting the utility of our approach.
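The idea of making a point's inclusion probability depend on local density can be illustrated with a minimal sketch. It is not the method from the paper: the grid-cell density estimate, the `exponent` parameter, and the names `density_biased_sample` and `cell_size` are assumptions made for illustration only. Each point is kept with probability proportional to its estimated density raised to `exponent`, so a positive exponent favors dense regions (useful for clustering), a negative exponent favors sparse regions (useful for outlier detection), and zero reduces to simple random sampling.

```python
import random
from collections import Counter


def density_biased_sample(points, sample_size, exponent, cell_size=1.0, seed=0):
    """Illustrative density-biased sampler (assumed design, not the paper's).

    Local density is approximated by the number of points that share a grid
    cell of side `cell_size`. Each point is kept independently with
    probability proportional to density ** exponent, scaled so that the
    expected sample size is about `sample_size` (before clipping at 1).
    Returns the sampled points and their inverse-probability weights.
    """
    rng = random.Random(seed)
    # Map every point to the grid cell that contains it.
    cells = [tuple(int(c // cell_size) for c in p) for p in points]
    counts = Counter(cells)
    # Unnormalized inclusion score for each point.
    scores = [counts[cell] ** exponent for cell in cells]
    total = sum(scores)

    sample, weights = [], []
    for point, score in zip(points, scores):
        prob = min(1.0, sample_size * score / total)
        if rng.random() < prob:
            sample.append(point)
            # Weight lets later estimates correct for the sampling bias.
            weights.append(1.0 / prob)
    return sample, weights
```

The inverse-probability weights are one common way to let a downstream algorithm compensate for the bias; whether the original work uses this correction is not stated in the abstract.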

Similar resources

Investigation of effective factors in expanding electronic payment in Iran using datamining techniques

E-banking has grown dramatically with the development of the ICT industry, and banks now offer their services to customers through different channels. Given the great economic benefits of electronic banking systems, the need to expand electronic banking is increasingly felt as a way to reduce costs and increase bank profitability. The purpose of this stu...

Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

In recent years, the rapid growth of spatial trajectory data and the need to process it and extract useful information and meaningful patterns have attracted many researchers to the field of spatio-temporal trajectory clustering. The processing and analysis of these trajectories have resulted in the extraction of useful information whic...

Privacy of Data, Preserving in Data Mining

A huge volume of detailed personal data is regularly collected, and sharing these data has proved beneficial for data mining applications. Such data include shopping habits, criminal records, medical history, credit records, etc. On one hand, such data are an important asset to business organizations and governments, which analyze them for decision making. On the other hand, privacy regulations and o...

Limitations of the SOM and the GTM

Datamining is becoming more and more popular thanks to the rapid development of computers and the need to extract information from increasingly large data collections. Within datamining, one interesting field is visualizing the data to obtain a better understanding. One common approach is clustering with topology preservation, which can be achieved with the very popular SOM algorithm. Very si...

Analysis of Extended Performance for clustering of Satellite Images Using Bigdata Platform Spark

Due to the recent emergence, clustering techniques have been widely adopted in many real-world data analysis applications, such as customer behavior analysis, targeted marketing, digital forensics, etc. As satellite imagery is being generated at a higher rate than in previous decades, it becomes essential to have better solutions in terms of accuracy as well as performance. In this paper,...

Spatial Analysis of COVID-19 and Exploration of Its Environmental and Socio-Demographic Risk Factors Using Spatial Statistical Methods: A Case Study of Iran

Background: Iran detected its first COVID-19 case in February 2020 in Qom province, which rapidly spread to other cities in the country. Iran, as one of those countries with the highest number of infected people, has officially reported 1812 deaths from a total number of 23049 confirmed infected cases that we used in the analysis. Materials and Methods: Geographic distribution by the map of ca...

Publication date: 2002